Visualizing and Understanding Convolutional Networks
Large Convolutional Network models have recently demonstrated impressive
classification performance on the ImageNet benchmark. However, there is no clear
understanding of why they perform so well, or how they might be improved. In
this paper we address both issues. We introduce a novel visualization technique
that gives insight into the function of intermediate feature layers and the
operation of the classifier. We also perform an ablation study to discover the
performance contribution from different model layers. This enables us to find
model architectures that outperform Krizhevsky et al. on the ImageNet
classification benchmark. We show our ImageNet model generalizes well to other
datasets: when the softmax classifier is retrained, it convincingly beats the
current state-of-the-art results on the Caltech-101 and Caltech-256 datasets.
Deep Poselets for Human Detection
We address the problem of detecting people in natural scenes using a part
approach based on poselets. We propose a bootstrapping method that allows us to
collect millions of weakly labeled examples for each poselet type. We use these
examples to train a Convolutional Neural Net to discriminate different poselet
types and separate them from the background class. We then use the trained CNN
as a way to represent poselet patches with a Pose Discriminative Feature (PDF)
vector -- a compact 256-dimensional feature vector that is effective at
discriminating pose from appearance. We train the poselet model on top of PDF
features and combine them with object-level CNNs for detection and bounding box
prediction. The resulting model leads to state-of-the-art performance for human
detection on the PASCAL datasets.
One-shot learning of object categories
Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density function on the parameters of these models. The posterior model for an object category is obtained by updating the prior in the light of one or more observations. We test a simple implementation of our algorithm on a database of 101 diverse object categories. We compare category models learned by an implementation of our Bayesian approach to models learned by maximum likelihood (ML) and maximum a posteriori (MAP) methods. We find that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.
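The prior-to-posterior update described in this abstract can be illustrated with a toy sketch. This is not the paper's category model: it assumes a single scalar parameter (the mean of a Gaussian with known noise variance) and a conjugate Gaussian prior, and the function names and numbers are hypothetical. It only shows why a prior keeps the estimate informative when a single example is available.

```python
def posterior_normal_mean(prior_mean, prior_var, obs, noise_var):
    """Conjugate Bayesian update for a Gaussian mean with known noise variance.

    Toy assumption: one scalar parameter stands in for a full category model.
    """
    n = len(obs)
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(obs) / noise_var)
    return post_mean, post_var


def ml_mean(obs):
    """Maximum-likelihood estimate of the mean: the sample average."""
    return sum(obs) / len(obs)


# Hypothetical numbers: a prior centred at 0 (standing in for knowledge from
# previously learned categories) and a single training example at 2.0.
prior_mean, prior_var, noise_var = 0.0, 1.0, 1.0
one_shot = [2.0]

post_mean, post_var = posterior_normal_mean(prior_mean, prior_var, one_shot, noise_var)
ml_estimate = ml_mean(one_shot)

print("ML estimate:", ml_estimate)
print("Posterior mean and variance:", post_mean, post_var)
```

With one observation the ML estimate simply copies the data point, while the posterior mean is pulled toward the prior and carries an explicit variance.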
Understanding Deep Architectures using a Recursive Convolutional Network
A key challenge in designing convolutional network models is sizing them
appropriately. Many factors are involved in these decisions, including number
of layers, feature maps, kernel sizes, etc. Complicating this further is the
fact that each of these influences not only the numbers and dimensions of the
activation units, but also the total number of parameters. In this paper we
focus on assessing the independent contributions of three of these linked
variables: the numbers of layers, feature maps, and parameters. To accomplish
this, we employ a recursive convolutional network whose weights are tied
between layers; this allows us to vary each of the three factors in a
controlled setting. We find that while increasing the numbers of layers and
parameters each has a clear benefit, the number of feature maps (and hence
dimensionality of the representation) appears ancillary, and finds most of its
benefit through the introduction of more weights. Our results (i) empirically
confirm the notion that adding layers alone increases computational power,
within the context of convolutional layers, and (ii) suggest that precise
sizing of convolutional feature map dimensions is itself of little concern;
more attention should be paid to the number of parameters in these layers
instead.
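The way weight tying decouples depth from parameter count can be sketched with simple arithmetic. This is a minimal illustration, not the paper's architecture: it assumes square kernels, the same number of feature maps into and out of every layer, and one bias per output map, and the function names are hypothetical.

```python
def conv_params(kernel, maps_in, maps_out, bias=True):
    """Weights in one conv layer: kernel*kernel*maps_in*maps_out, plus biases."""
    return kernel * kernel * maps_in * maps_out + (maps_out if bias else 0)


def stack_params(layers, maps, kernel=3, tied=False):
    """Parameter count for a stack of identical conv layers.

    With tied weights every layer reuses one set of filters, so the count
    does not grow with depth (assumption: all layers share the same shape).
    """
    per_layer = conv_params(kernel, maps, maps)
    return per_layer if tied else layers * per_layer


untied_4 = stack_params(layers=4, maps=32, tied=False)
tied_4 = stack_params(layers=4, maps=32, tied=True)
tied_8 = stack_params(layers=8, maps=32, tied=True)

print("untied, 4 layers:", untied_4)
print("tied,   4 layers:", tied_4)
print("tied,   8 layers:", tied_8)
```

In the tied case, depth can be changed while the parameter count stays fixed; in the untied case, the two grow together. That is the kind of controlled comparison the abstract describes.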